A Unifying Approach to HTML Wrapper Representation and Learning

Authors

  • Gunter Grieser
  • Klaus P. Jantke
  • Steffen Lange
  • Bernd Thomas

Similar resources

A Unifying Approach to HTML

The number, size, and dynamics of Internet information sources bear abundant evidence of the need for automation in information extraction. This calls for representation formalisms that match the World Wide Web reality and for learning approaches and learnability results that apply to these formalisms. The concept of elementary formal systems is appropriately generalized to allow for t...

Extracting Partial Structures from HTML Documents

A new wrapper model for extracting text data from HTML documents is introduced. In this model, an HTML file is considered as an ordered labeled tree. The learning algorithm takes a sequence of pairs, each consisting of an HTML tree and a set of nodes. The nodes indicate the labels to extract from the HTML tree. The goal of the learning algorithm is to output the wrapper which exactly extracts the labels from...

Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques

Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...

Object-Oriented Web-Based Courses Development through XML

Web-based methodology has become a new paradigm for constructing computer-assisted learning systems. However, HTML is a data representation language, so HTML-based courseware is machine-readable but not machine-understandable. The lack of suitable abstraction makes it difficult to construct frameworks for retrieving reusable pieces from HTML documents of different Web-based courses. Therefore, the ...

Interactive Learning of HTML Wrappers Using Attribute Classification

Reviewing the current HTML wrapping systems, it is possible to recognise two mainstream categories. The first category comprises systems based on various machine learning techniques, which have lower roll-out and maintenance costs but achieve worse results and are usually specialised to particular domains. The second category comprises systems that allow more complicated wrapping solutions to be built. But here...

Publication date: 2000